SUPPORT VECTOR MACHINES
1. SVM with various kernels
The svm() function is provided by the package e1071.
> install.packages("e1071")
> library(e1071)
Let’s use support vector machines to classify cars into Economy and Consuming classes.
> ECO = ifelse( mpg > 22.75, "Economy", "Consuming" )
> Color = ifelse( mpg > 22.75, "green", "red" )
> plot( weight, horsepower, lwd=3, col=Color )
The two classes overlap, so no hyperplane separates them perfectly. The SVM method is still applicable, because a soft margin allows some points to violate it.
> S = svm( ECO ~ weight + horsepower, data=Auto, kernel = "linear" )
Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :
Need numeric dependent variable for regression.
Error? ECO is a character vector, not a factor, so svm() attempts regression and fails on the non-numeric response. We’ll create a reduced dataset that stores ECO as a factor together with the two predictors.
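The distinction between a character vector and a factor can be checked directly in base R. The sketch below uses a few made-up mpg values (mpg_demo, eco, eco_f, and d_demo are hypothetical names, not part of the Auto analysis) to show what ifelse() returns and how factor() changes it.

```r
# Made-up mpg values, for illustration only
mpg_demo <- c(18, 31, 24, 15)

eco <- ifelse(mpg_demo > 22.75, "Economy", "Consuming")
class(eco)        # "character"
is.factor(eco)    # FALSE

eco_f <- factor(eco)
levels(eco_f)     # "Consuming" "Economy"

# Storing the factor in a data frame keeps it a factor
d_demo <- data.frame(ECO = eco_f, mpg = mpg_demo)
is.factor(d_demo$ECO)   # TRUE
```

svm() defaults to classification when the response is a factor and to regression otherwise. Since R 4.0, data.frame() no longer converts strings to factors by default, so an explicit factor() call is the safer habit.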
> d = data.frame(ECO = factor(ECO), weight, horsepower)
> S = svm( ECO ~ weight + horsepower, data=d, kernel="linear" )
> summary(S)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1
gamma: 0.5
Number of Support Vectors: 120
( 60 60 )
So, there are 120 support vectors, 60 in each class. These are the points that lie on the margin or violate it, including the misclassified points.
> plot(S, data=Auto)
Error in plot.svm(S, data = Auto) : missing formula.
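When the data contain more than two predictors, plot() needs a formula to choose the two plotted dimensions. The simplest fix is again the reduced dataset, which contains only the needed variables.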
> plot(S, data=d)
This is the final classification with a linear kernel and, therefore, a linear boundary. Support vectors are marked as “x”, other points as “o”.
We can look at other types of kernels and boundaries – polynomial, radial, and sigmoid.
> S = svm( ECO ~ weight + horsepower, data=d, kernel="polynomial" )
> summary(S); plot(S,d)
Number of Support Vectors: 176
> S = svm( ECO ~ weight + horsepower, data=d, kernel="radial" )
> summary(S); plot(S,d)
Number of Support Vectors: 121
> S = svm( ECO ~ weight + horsepower, data=d, kernel="sigmoid" )
> summary(S); plot(S,d)
Number of Support Vectors: 74
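For reference, the four kernels can be written as small R functions. This is a sketch following the formulas given in the e1071 documentation for svm(), with its defaults degree = 3, coef0 = 0 and gamma = 1/(number of predictors), which matches the gamma of 0.5 reported above for two predictors. The names k_linear, k_polynomial, k_radial, and k_sigmoid are hypothetical helpers, not part of e1071.

```r
# u and v are numeric vectors of equal length (one observation each)
k_linear <- function(u, v) sum(u * v)

k_polynomial <- function(u, v, gamma = 1 / length(u), coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

k_radial <- function(u, v, gamma = 1 / length(u)) {
  exp(-gamma * sum((u - v)^2))
}

k_sigmoid <- function(u, v, gamma = 1 / length(u), coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)
}

u <- c(1, 2)
v <- c(3, 1)
k_linear(u, v)       # 5
k_polynomial(u, v)   # (0.5 * 5)^3 = 15.625
k_radial(u, v)       # exp(-0.5 * 5), about 0.082
k_sigmoid(u, v)      # tanh(0.5 * 5), about 0.987
```

svm() scales the predictors by default, so in the fits above these kernels act on standardized weight and horsepower rather than on the raw values.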
Adding more variables should give a better fit to the training data, though not necessarily to new data.
> S = svm( factor(ECO) ~ weight + horsepower + displacement + cylinders, data=Auto, kernel="linear" )
> summary(S)
Number of Support Vectors: 99
We can identify the support vectors:
> S$index
[1] 16 17 18 25 33 45 46 48 60 61 71 76 77 78 80 100 107 108 109
[20] 110 111 112 113 119 120 123 153 154 162 173 178 199 206 208 209 210 240 241
[39] 242 253 258 262 269 273 274 275 280 281 384 24 31 49 84 101 114 122 131
[58] 149 170 177 179 192 205 218 233 266 270 271 296 297 298 299 305 306 313 314
[77] 318 322 326 327 331 337 338 353 355 356 357 358 360 363 365 368 369 375 381
[96] 382 383 385 387
> Auto[S$index,]
mpg cylinders displacement horsepower weight acceleration year origin
16 22.0 6 198 95 2833 15.5 70 1
17 18.0 6 199 97 2774 15.5 70 1
18 21.0 6 200 85 2587 16.0 70 1
25 21.0 6 199 90 2648 15.0 70 1
< truncated >
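The fitted object stores more than the row indices. As a minimal sketch, assuming the e1071 package is installed and using R's built-in mtcars data as a small stand-in for Auto (cars and fit are hypothetical names), the support vectors can be counted and extracted like this:

```r
library(e1071)

# Stand-in data: mtcars ships with R; wt and hp play the roles of weight and horsepower
cars <- data.frame(
  ECO        = factor(ifelse(mtcars$mpg > median(mtcars$mpg), "Economy", "Consuming")),
  weight     = mtcars$wt,
  horsepower = mtcars$hp
)

fit <- svm(ECO ~ weight + horsepower, data = cars, kernel = "linear")

fit$tot.nSV               # total number of support vectors
fit$nSV                   # number of support vectors in each class
head(cars[fit$index, ])   # the original rows that are support vectors
head(fit$SV)              # the same points in scaled predictor coordinates
```

The per-class counts in nSV add up to tot.nSV, which is the pair of numbers printed by summary().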
2. Tuning and cross-validation
The “cost” option specifies the cost of violating the margin: a larger cost penalizes violations more heavily. We can try costs 0.001, 0.01, 0.1, 1, 10, 100, 1000:
> Stuned = tune( svm, ECO ~ weight + horsepower, data=d, kernel="linear", ranges=list(cost=10^seq(-3,3)) )
> summary(Stuned)
- sampling method: 10-fold cross validation
- best parameters:
cost
0.1
- best performance: 0.1173718
- Detailed performance results:
cost error dispersion
1 1e-03 0.2478205 0.10663023
2 1e-02 0.1432051 0.05485355
3 1e-01 0.1173718 0.04208311 # This cost yielded the lowest cross-validation error of classification.
4 1e+00 0.1326282 0.04461101
5 1e+01 0.1351923 0.04819639
6 1e+02 0.1351923 0.04819639
7 1e+03 0.1351923 0.04819639
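The pieces of this summary are also available as components of the object returned by tune(). A minimal sketch, assuming e1071 is installed and using the built-in mtcars data as a stand-in for Auto (cars, tuned, and best are hypothetical names); set.seed() fixes the random folds so that the run can be repeated:

```r
library(e1071)
set.seed(1)   # cross-validation folds are drawn at random

# Stand-in data: wt and hp play the roles of weight and horsepower
cars <- data.frame(
  ECO        = factor(ifelse(mtcars$mpg > median(mtcars$mpg), "Economy", "Consuming")),
  weight     = mtcars$wt,
  horsepower = mtcars$hp
)

tuned <- tune(svm, ECO ~ weight + horsepower, data = cars, kernel = "linear",
              ranges = list(cost = 10^seq(-3, 3)))

tuned$best.parameters    # data frame holding the winning cost
tuned$best.performance   # its cross-validation error
tuned$performances       # full table: cost, error, dispersion

best <- tuned$best.model # svm refitted on all rows with the best cost
summary(best)
```

Using best.model avoids retyping the selected cost into a separate svm() call.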
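We can also tune the kernel together with the cost. The cross-validation folds are drawn at random, so the errors for the linear kernel differ slightly from those of the previous run.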
> Stuned = tune( svm, ECO ~ weight + horsepower, data=d, ranges=list(cost=10^seq(-3,3), kernel=c("linear","polynomial","radial","sigmoid")) )
> summary(Stuned)
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
cost kernel
0.1 sigmoid
- best performance: 0.1046154
- Detailed performance results:
cost kernel error dispersion
1 1e-03 linear 0.2164744 0.10501351
2 1e-02 linear 0.1326282 0.05074006
3 1e-01 linear 0.1096154 0.04330918
4 1e+00 linear 0.1172436 0.03813782
5 1e+01 linear 0.1223718 0.04775672
6 1e+02 linear 0.1223718 0.04775672
7 1e+03 linear 0.1223718 0.04775672
8 1e-03 polynomial 0.3720513 0.08274072
9 1e-02 polynomial 0.2601282 0.06438244
10 1e-01 polynomial 0.1987821 0.07443903
11 1e+00 polynomial 0.1784615 0.05328633
12 1e+01 polynomial 0.1580769 0.04909157
13 1e+02 polynomial 0.1555128 0.04999836
14 1e+03 polynomial 0.1504487 0.04722372
15 1e-03 radial 0.5816026 0.05687780
16 1e-02 radial 0.1301282 0.05190241
17 1e-01 radial 0.1198077 0.05104329
18 1e+00 radial 0.1223718 0.04118608
19 1e+01 radial 0.1096795 0.04835338
20 1e+02 radial 0.1198718 0.04184981
21 1e+03 radial 0.1146795 0.04354410
22 1e-03 sigmoid 0.5816026 0.05687780
23 1e-02 sigmoid 0.1530769 0.04517581
24 1e-01 sigmoid 0.1046154 0.03711533 # The best kernel and cost.
25 1e+00 sigmoid 0.1173718 0.04715638
26 1e+01 sigmoid 0.1530769 0.06159616
27 1e+02 sigmoid 0.1582051 0.06489946
28 1e+03 sigmoid 0.1582051 0.06489946
> Soptimal = svm( ECO ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )
> summary(Soptimal); plot(Soptimal,data=d)
Parameters:
SVM-Type: C-classification
SVM-Kernel: sigmoid
cost: 0.1
gamma: 0.5
Number of Support Vectors: 164 # More support vectors generally go with a wider margin: lower variance, but possibly higher bias
( 82 82 )
Number of Classes: 2
Levels: Consuming Economy
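Let’s use the validation set method to estimate the classification rate of this optimal SVM. We train on a random half Z and predict the other half. Note that predict() expects the argument newdata; without it, it returns fitted values for the training data.
> n = length(mpg); Z = sample(n,n/2)
> Strain = svm( ECO ~ weight + horsepower, data=d[Z,], cost=0.1, kernel="sigmoid" )
> Yhat = predict( Strain, newdata=d[-Z,] )
> table( Yhat, ECO[-Z] )
Yhat Consuming Economy
Consuming 82 9
Economy 17 88
> mean( Yhat==ECO[-Z] )
[1] 0.8673469
Note: the table and the rate shown here were recorded from a run that predicted the training half Z itself (data= instead of newdata=, compared against ECO[Z]), so they measure training accuracy. Results on the held-out half depend on the random split and will differ.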
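A quick way to confirm that predictions really come from the held-out rows is to check their length. The sketch below assumes e1071 is installed and uses the built-in mtcars data as a stand-in for Auto, with a deliberately uneven split (cars, train, fit, p_fitted, and p_heldout are hypothetical names). predict() for svm objects takes newdata; an argument named data is silently ignored, and fitted values for the training rows are returned instead.

```r
library(e1071)
set.seed(1)

# Stand-in data: wt and hp play the roles of weight and horsepower
cars <- data.frame(
  ECO        = factor(ifelse(mtcars$mpg > median(mtcars$mpg), "Economy", "Consuming")),
  weight     = mtcars$wt,
  horsepower = mtcars$hp
)

train <- sample(nrow(cars), 20)   # 20 training rows, 12 held out
fit <- svm(ECO ~ weight + horsepower, data = cars[train, ], kernel = "linear")

p_fitted  <- predict(fit, data = cars[-train, ])      # 'data' is ignored: training fits
p_heldout <- predict(fit, newdata = cars[-train, ])   # predictions for held-out rows

length(p_fitted)    # 20, one per training row
length(p_heldout)   # 12, one per held-out row

table(p_heldout, cars$ECO[-train])
mean(p_heldout == cars$ECO[-train])   # validation-set classification rate
```

With an even split, as in the commands above, both vectors have the same length, so the mistake does not show up as an error or a warning.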
3. More than two classes
Let’s create more categories of ECO. The same tool svm( ) can handle multiple classes.
> summary(mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.00 17.00 22.75 23.45 29.00 46.60
> ECO4 = rep("Economy",n)
> ECO4[mpg < 29] = "Good"
> ECO4[mpg < 22.75] = "OK"
> ECO4[mpg < 17] = "Consuming"
> table(ECO4)
ECO4
Consuming Economy Good OK
92 103 93 104
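The same four classes can be built in one step with cut(), which returns a factor directly. A sketch on a few made-up mpg values (mpg_demo, eco4_steps, and eco4_cut are hypothetical names); the break points are the quartiles printed by summary(mpg) above:

```r
# Made-up mpg values that sit on and around the break points
mpg_demo <- c(9, 16.9, 17, 22.74, 22.75, 28.9, 29, 46.6)

# Step-by-step construction, as in the commands above
eco4_steps <- rep("Economy", length(mpg_demo))
eco4_steps[mpg_demo < 29]    <- "Good"
eco4_steps[mpg_demo < 22.75] <- "OK"
eco4_steps[mpg_demo < 17]    <- "Consuming"

# One-step construction; right = FALSE gives intervals of the form [a, b)
eco4_cut <- cut(mpg_demo, breaks = c(-Inf, 17, 22.75, 29, Inf), right = FALSE,
                labels = c("Consuming", "OK", "Good", "Economy"))

all(eco4_steps == as.character(eco4_cut))   # TRUE
levels(eco4_cut)                            # ordered by mpg, not alphabetically
```

Because the result is already a factor, it can serve as the response in svm() without a further factor() call; its levels then appear in order of increasing mpg.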
> S4 = svm( ECO4 ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )
Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :
Need numeric dependent variable for regression.
R tried to fit a regression SVM because ECO4 is a character vector, and then stopped because the response is not numeric. We can direct R to do classification by replacing ECO4 with factor(ECO4).
> S4 = svm( factor(ECO4) ~ weight + horsepower, data=d, cost=0.1, kernel="sigmoid" )
> plot(S4, data=d)
> Yhat = predict( S4, data.frame(Auto) )
> table( Yhat, ECO4 )
ECO4
Yhat Consuming Economy Good OK
Consuming 88 0 2 33
Economy 0 96 58 15
Good 0 2 9 5
OK 4 5 24 51
> mean( Yhat == ECO4 )
[1] 0.622449
It’s more difficult to predict finer classes correctly, even on the training data used here.
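The overall rate hides large differences between the classes. The arithmetic can be reproduced from the confusion table above (cm, accuracy, and recall are hypothetical names); rows are predictions and columns are true classes:

```r
classes <- c("Consuming", "Economy", "Good", "OK")

# Confusion table copied from the output above, filled row by row
cm <- matrix(c(88,  0,  2, 33,
                0, 96, 58, 15,
                0,  2,  9,  5,
                4,  5, 24, 51),
             nrow = 4, byrow = TRUE,
             dimnames = list(Yhat = classes, ECO4 = classes))

accuracy <- sum(diag(cm)) / sum(cm)   # 244 / 392
accuracy                              # 0.622449

# Share of each true class that is predicted correctly
recall <- diag(cm) / colSums(cm)
round(recall, 3)                      # 0.957 0.932 0.097 0.490
```

Only 9 of the 93 cars in the class Good are classified correctly; most of them are assigned to the neighboring classes Economy and OK.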